DOCUMENT RESUME

ED 109 064 SP 009 310

AUTHOR Acland, Henry

TITLE A Study of Teacher Effects Based on Students'
Achievement Scores.
PUB DATE 75
NOTE 37p.

EDRS PRICE MF-$0.76 HC-$1.95 PLUS POSTAGE

DESCRIPTORS *Academic Achievement; *Effective Teaching;
Elementary Education; Grade 5; Student Evaluation;
Teacher Behavior; *Teacher Influence; Teaching

IDENTIFIERS *Metropolitan Achievement Test (MAT)
ABSTRACT

This report tests the assumption that teachers have an impact on
how much students learn. The results of this study indicate that
teachers have an effect on average class achievement scores, and
that this effect can be broken down into a stable component
attributed to the teachers' consistency, and an unstable effect
which varies from year to year. The stable component can be
measured by (a) the consistency teachers have in teaching different
skills to the same students, and (b) their consistency in teaching
the same skill to different students. The data were collected from
89 fifth-grade teachers. Student achievement was tested in October
and April in two consecutive years on the Intermediate Battery of
the Metropolitan Achievement Test (MAT: 1959). Adjusted gain scores
were computed, based on class means, and the gain was used as an
index of relative teacher effectiveness. The following three
assumptions are implicit in the use of these gains: (a) the MAT is
a relevant index of student performance, (b) gain scores measure
both teachers' deliberate behavior and variables beyond the control
of the teacher, and (c) students in average or below-average classes
may learn considerably during the year, although in comparison to
other classes they may have learned less. Results of the study also
indicate that teachers do not have a consistent effect on the spread
of achievement scores in their classes. (Author/JS)





A STUDY OF TEACHER EFFECTS

BASED ON STUDENTS' ACHIEVEMENT SCORES






HENRY ACLAND 



1975 






Introduction 



There is a widely shared view that some teachers
yield better results than others. It is not just that
some are more skilled at teaching but that their abilities
have a degree of constancy. This is reflected in the
implicit assumptions of hiring and promotion policies.
One teacher is chosen and another turned down because
the past teaching records of both are thought to predict
their future performances. The same beliefs underlie
training programs that are designed to teach competencies
that make graduates of the program consistently more
effective than other recruits. Finally, the belief in variation
in teacher effectiveness underlies research on teachers
which has sought to identify the behavioural correlates of
effective teaching as if the prior problem of establishing
the existence of variation in teacher productivity had been
dispensed with. The research reported here tests the evidence
on which such beliefs in teacher effectiveness might rest,
by examining the importance of teachers' effects.

Several different definitions of teacher impact are
used, so it is essential to separate and clarify them at the
outset. The definitions rest on a set of distinctions, the
most important of which is the distinction between the stable
and unstable components of teacher impact. The stable
component, or consistent teacher effect, can be measured in
two ways: the consistency teachers have in teaching different
skills to the same group of students, and their consistency
in teaching the same skill to different groups of students.
The first way of looking at consistent effects examines the
degree to which a teacher who is good at teaching one kind of
competence, say reading, is also effective at teaching
another, say arithmetic computation. The second way of
looking at consistency examines the degree to which a
teacher who is effective in one year, with one class of
students, is effective in the following year with a
different group of students. It is this second way of defining
teacher effects that gets most attention here, because
it corresponds to the implicit belief, referred to above, in
consistency over time in the way teachers contribute to
student learning.

Stable teacher effects are distinguished from unstable
effects. The unstable effect is defined in temporal terms
as the component which varies from one year to the next. It
is the teacher effect which is specific to a given period,
here the school year, and which cannot be attributed to
that teacher's stable effectiveness. If this effect is
found to be considerable, it would imply that teacher effects
vary from one context to another, and this opens up the
question of the nature and determinants of this context.
For example, it might be that the student composition of the
classroom affects the teacher's style, and hence the teacher's
effectiveness. Alternatively, teachers may alter their
techniques independently of the students they teach. Such
possibilities suggest sets of hypotheses about the way teachers
operate, but these questions lie outside the scope of this
paper, which is concerned with the simpler problem of the
importance of the unstable effect.

Cutting across the distinction between stable and
unstable teacher effects is the distinction between effects
on the class average score and effects on the spread of scores
within the class. Analysis of teacher effects on the class
average implies a concern with the average shifts in the
performance of the students in the teacher's class. By
contrast, analysis of the spread of scores, measured by the
standard deviation or the variance, implies an interest in
the degree to which teachers increase or decrease the
disparities in the performance of students within their class.
More attention is given here to teacher effects on class mean
scores than to their effects on the variance. Once again, this
is because the prevalent conception of the difference between
an effective and a less effective teacher is that the former
is successful in improving the overall performance of the
class. Perhaps the most obvious way of measuring the overall
performance of the class is with the class average score.
A less common definition of an effective teacher is one
who has a differential effect on students such that the
spread of scores in a class increases or decreases over
the course of the year. This definition leads to
examination of teacher effects on the dispersion of students'
scores.

Little research has adopted the framework used in this
paper. Several short-term studies of teacher consistency
have been carried out, but only four long-term studies,
in which the teacher's contact with the students was longer
than four weeks, were located. All have concentrated on
teachers' impact on class average scores. Three of these
studies (Morsch et al. 1955, Harris et al. 1968, Soar 1966)
have been reviewed by Rosenshine (1970), and his summary
needs little embellishment:

"...on the basis of these studies, evidence on the
consistency of teacher effects is weak because
correlations as high as .5 were obtained in only one study
and all other correlations were about .35 or much
lower."

The fourth study, by Brophy (1973), looked at the
stability of teacher effects across three years and found
correlations "generally higher than those obtained in the
long-term studies reviewed by Rosenshine." Median correlations
for the four subgroups of teachers lie between .25 and .42.
These studies are congruent and imply a modest stability of
teacher effects across years. However, it will be argued
in this paper that the correlation by itself does not really
indicate the importance of the stable teacher effect. This
study is also justified because it seems likely that conclusions
about teacher effects will be pieced together from small studies
rather than deriving from large-scale surveys. The main reason
for this is the complicated student testing program that the
research design requires. Given this practical problem,
research on teacher consistency will probably be based on several
small, and possibly unrepresentative, samples of teachers.




Method 

The data were collected from 89 fifth grade teachers,
who form a systematic selection of all fifth grade teachers
in a large school system. Teachers were included only if they
taught at the same grade level in the same school in two
consecutive years and if the school's fifth grade classes were
self-contained. Where students were taught by more than one
teacher, the school was dropped. It is assumed in this analysis
that the variation among these 89 teachers bears some resemblance
to the variation that would be found in samples of larger
populations.

Student achievement was tested in this school system in
October and April in two consecutive years on the Intermediate
Battery of the Metropolitan Achievement Test (1959). Different
forms of this test were given in fall and spring. The results
of the tests were obtained for each of nine MAT subtests for each
student. The main analyses of this paper are based on class
average achievement scores, which are calculated for each subtest.
For each subtest there are (at least) ten possible ways of
deriving an average score.

In the first place, average scores can be calculated on the
basis of either "matched" or "unmatched" students. Matched
students are those who took a given subtest in both fall and
spring of a given year; the unmatched group is that which was
tested on one or the other of these occasions, either fall or
spring, but not necessarily both. The second source of
complication is the variety of metrics which can be derived from
the raw scores. Only two are considered here: publisher's
standardized scores and grade equivalent scores. Following the
procedures laid down in the publisher's manual, raw scores can
be converted into nationally normed standardized scores, and these
standard scores can be transformed into grade equivalents. The
third source of complication concerns whether scores are
transformed before or after aggregation. For instance, pupil-level
raw scores can be transformed into standardized and grade equivalent
scores and subsequently aggregated to give a class average on all
three metrics. Alternatively, the raw scores can be aggregated
to give a class average raw score, and this average score can
be transformed into the standardized and grade equivalent scores.
These ten possible routes to an aggregate score are summarized in
the diagram:

                             Matched   Unmatched
Raw scores                      1          2
Standardized scores             3          4
Grade equivalent scores         5          6

1, 2 = no transformation possible; one class average score computed
       in each cell.
3-6  = transformation from raw score into metric can occur before or
       after aggregation of data; two class average scores computed
       in each cell.




It was important to see if different aggregation routines
altered the analysis. To test the proposition that it would
make a difference, the class average fall scores set out above
were correlated to examine differences in the relative ordering
of classes. In the first place, the effect of using different
metrics was examined by correlating raw score averages with
standardized score and grade equivalent averages, both of the
latter being transformed prior to aggregation. Correlations were
calculated for each subtest between each type of metric. They
were all extremely high and sufficiently close that it is
unnecessary to report each correlation. The mean correlation
between raw and standardized averages is 0.?97, between raw and
grade equivalent averages 0.984, and between standardized score
and grade equivalent averages 0.9?0. Since the class averages
are so closely associated, any one metric could be substituted for
the others; in fact the standardized metric was chosen because it
makes comparisons across subtests easier.

The second concern was whether the point of aggregation,
before or after transformation, would make a difference to
the ranking of classes. Again, correlations were computed, this
time between scores that had been transformed and subsequently
aggregated and scores that had been aggregated and subsequently
transformed. Here too the correlations were high, averaging 0.994
for standardized scores and 0.982 for grade equivalent scores.

The final question was whether the class averages should be
based on matched or unmatched groups of students. There are
obvious reasons for preferring the matched groups, but an argument
could be made for using unmatched groups if the increase in the
number of students improved the reliability of class average scores.
In fact, the correlations between matched group and unmatched
group means were sufficiently high for this decision to seem
unimportant. For raw scores the average correlation between
matched and unmatched averages across subtests is 0.990, for
standardized scores 0.990, and for grade equivalent scores 0.987.
As a result of these analyses, only standardized average
scores derived from matched groups, where the standardization
preceded the aggregation of data, will be used in the rest of
the paper.
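The matched and unmatched definitions differ only in which pupils enter each class mean. A short sketch with hypothetical pupil records for one class and one subtest (`None` marks a missed test occasion; all names and scores are invented for illustration) shows the two computations side by side:

```python
from statistics import mean

# Hypothetical records for one class on one subtest:
# pupil -> (fall score, spring score); None marks a missed occasion.
records = {
    "p1": (48, 55),
    "p2": (52, None),   # tested in fall only
    "p3": (None, 60),   # tested in spring only
    "p4": (45, 50),
}

def matched_means(recs):
    """Fall and spring means over pupils tested on both occasions."""
    both = [(f, s) for f, s in recs.values() if f is not None and s is not None]
    return mean(f for f, _ in both), mean(s for _, s in both)

def unmatched_means(recs):
    """Fall and spring means over every pupil tested on that occasion."""
    falls = [f for f, _ in recs.values() if f is not None]
    springs = [s for _, s in recs.values() if s is not None]
    return mean(falls), mean(springs)

print("matched:  ", matched_means(records))
print("unmatched:", unmatched_means(records))
```

The unmatched means use more pupils, which is the possible reliability gain the text mentions, at the cost of comparing somewhat different groups in fall and spring.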

For the analyses concerned with teacher effects on class
mean scores, adjusted gain scores are computed, based on the
class means, for each teacher in Year 1 and again in Year 2.

This gain is used as an index of relative teacher
effectiveness. Effectiveness is here defined as the amount by
which the class average score in the spring exceeds the level
that would be predicted on the basis of the class average score
in the fall. It is a relative measure in the sense that it
compares one teacher's effectiveness to the average effectiveness
of this group of teachers. A class's adjusted gain is the difference
$Y - \hat{Y}$, where $\hat{Y} = a + bX_f$. In this expression,
$Y$ is the class's spring average, $X_f$ its fall average, and
$a$ is the intercept and $b$ the regression coefficient from the
regression of spring on fall scores of a given subtest. Only
one independent variable was used, since it was found that the use
of more independent variables did not seriously affect the main
analyses (see Tables 2 and 3).




There are three assumptions implicit in the use of these
gains. First, it is assumed that the MAT is a relevant index of
student performance. The study can be criticized on the grounds
that these teachers were not trying to improve the skills which the
MAT measures, but this issue cannot be resolved without
investigating these teachers' goals directly. The validity of the
MAT for the purposes of this study is a matter of judgement.

Second, there is uncertainty about designating the
gain score as a measure of teacher effectiveness. The
adjusted gain scores may measure the consequences of
teachers' deliberate behavior; however, other influences may be
at work which are associated with the teachers but which are
not under the teacher's control. For example, gains may reflect
uncontrolled school-level variables such as the presence of a
particular curricular model, or they could be due to aggregate
differences in home background such as the level of parental
encouragement. If factors such as these play a part, it would lead
one to a conclusion, directly opposed to Brophy's, that the
adjusted gains are best regarded as maximal, not minimal,
estimates of teacher effects. However, this conclusion should be
offset against the observation that the MAT, like other
standardized tests, is constructed in a way which makes it insensitive
to the unique effects of different teachers. Standardized tests
are designed to be fair in the sense that they test skills which
all students could be expected to have had the chance to learn.
This lessens the chance that students who have learned particular
things in uncommon situations, a special program perhaps or an
unconventional teacher, will stand out above students who were
exposed to more conventional situations. A different kind of
achievement test could well give different and possibly higher
estimates of teacher effectiveness.

Third, students in average or below-average classes may still
learn considerable amounts during the year, although in comparison
to other classes they have learned less. Teachers presumably
have absolute effects in addition to their effects relative to
one another. Unfortunately, these absolute effects could not be
measured with these data.



Analysis 

1) Consistency across subtests within years 

The first four sections of the analysis are concerned with
the effect teachers have on the class average score. This part
looks at the degree to which class adjusted gains based on class
mean scores of a given subtest of the MAT are consistent with
gains measured on other subtests during the same school year.
If high levels of consistency are found, it would suggest either
that teachers who are effective in increasing a class's average
score on one skill are also effective at teaching the other skills
measured by the nine MAT subtests or, alternatively, that the nine
subtests are measuring essentially the same kind of competence and
this single competence is being influenced by the teacher. The
credibility of the second of these alternatives can be tested by
factor analyzing the nine MAT subtests derived from each of four
different testing occasions: fall and spring of Year 1 and fall
and spring of Year 2. Class mean scores are used in each analysis.
If the first principal component accounts for a large proportion
of the total variation in the nine subtests, it would be reasonable
to conclude that the subtests measure a common skill. In the four
factor analyses the first principal component accounts for between
82.0% and 84.5% of the variance, a relatively high percentage.
This means that if the within-year correlations among adjusted
gains were high, the consistency could be attributed to the
commonality of the nine MAT subtests. However, the
intercorrelations of the adjusted gain scores of the subtests,
presented separately for Year 1 and Year 2 in Table 1, are only
moderate.
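The share of variance captured by the first principal component of a correlation matrix is its largest eigenvalue divided by the number of variables (the trace of the matrix). The sketch below finds that eigenvalue by power iteration. The paper does not report the correlation matrices of class means it factor analyzed, so the nine-subtest matrix used here, with every off-diagonal entry set to 0.8, is purely illustrative:

```python
def first_pc_share(corr, iterations=200):
    """Proportion of total variance (trace = n for a correlation matrix)
    captured by the largest eigenvalue, found by power iteration.

    When all correlations are positive, the leading eigenvector has
    all-positive entries, so a starting vector of ones is never
    orthogonal to it."""
    n = len(corr)
    v = [1.0] * n
    for _ in range(iterations):
        w = [sum(corr[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    # Rayleigh quotient v'Rv of the unit-length converged vector.
    rv = [sum(corr[i][j] * v[j] for j in range(n)) for i in range(n)]
    eigenvalue = sum(v[i] * rv[i] for i in range(n))
    return eigenvalue / n

# Illustrative 9 x 9 matrix: ones on the diagonal, 0.8 elsewhere.
rho = 0.8
corr = [[1.0 if i == j else rho for j in range(9)] for i in range(9)]
print(round(first_pc_share(corr), 4))
```

For this equicorrelation matrix the largest eigenvalue is 1 + 8(0.8) = 7.4, so the first component accounts for 7.4/9, about 82% of the variance.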



TABLE 1. Zero order correlations among adjusted gain scores,
within years. Based on standardized scores for matched
groups, transformed before aggregation.

YEAR 1

[Correlation matrix not legible in the source scan.]

Correlations based on minimum of 70 cases.

YEAR 2

         WK         RD         LG
RD    0.365***
LG    0.376***   0.437***
LS    0.459***   0.568***   0.521***

[Rows AC through SC not fully legible in the source scan.]

Correlations based on minimum of 84 cases.

WK = Word Knowledge, RD = Reading, LG = Language, LS = Language
Study Skills, AC = Arithmetic Computation, AP = Arithmetic Problem
Solving, SS = Social Studies Information, SK = Social Studies
Study Skills, SC = Science






The average correlation among the adjusted gains in Year 1 is
0.40 and in Year 2, 0.46.*

The size of these correlations is related to the reliability
of the gain scores. If the gains can be shown to have an
appreciable error component, the correlations will be underestimated.
To exaggerate the possible size of this error, the reliability of
the class mean scores was estimated by substituting the most
conservative values for the MAT subtests in Shaycroft's formula
(Shaycroft, 1962). The estimated reliability, 0.96, is
consequently the lowest estimate that could be obtained for these
subtests and class sizes. About 60% of the variance in the spring
class means can be accounted for by the fall mean scores, which
leaves 40% of the original variance containing all the error of
those scores. One minus the proportion of this error in the
residual variance is the estimated reliability of the adjusted
gain scores. In this instance, the error variance in class mean
scores is 4% of the total (1 - 0.96) and the residual variance is
40%, indicating a reliability of 1 - 0.04/0.40 = 0.90 for the
gains. Bearing in mind that this estimate is conservative, it
seems unlikely that correcting the correlations reported in
Table 1 for unreliability will change them substantially.
In view of this and the fact that the subtests load heavily
on a single principal component, it is reasonable to conclude
that teachers could have differentiated effects on student learning.



*These correlations appear slightly lower than Brophy's within-year
correlations. For Brophy's 12 teacher subgroups the average
within-year correlations range between 0.29 and 0.71.






A good mathematics teacher is not necessarily good at teaching
language skills. However, the fact that all the correlations
are positive indicates that there is some correspondence in
the degree to which teachers' classes change above or below the
average rate in different tested skills.
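The reliability argument earlier in this section, and the correction for unreliability it implies, can be checked with a few lines of arithmetic. The sketch follows the text's simplification that all the error in the class means falls in the residual variance after regressing on fall means; the correction itself is Spearman's standard correction for attenuation, r / sqrt(rel_x * rel_y), which the paper alludes to but does not write out. The function names are illustrative:

```python
def gain_reliability(mean_reliability, r_squared_fall):
    """Reliability of adjusted gains when the error in class means
    (1 - mean_reliability) is assumed to fall entirely in the residual
    variance (1 - r_squared_fall) left after regressing on fall means."""
    error_share = 1.0 - mean_reliability
    residual_share = 1.0 - r_squared_fall
    return 1.0 - error_share / residual_share

def disattenuate(r, reliability_x, reliability_y):
    """Spearman's correction for attenuation."""
    return r / (reliability_x * reliability_y) ** 0.5

rel = gain_reliability(0.96, 0.60)
print(round(rel, 2))                           # reliability of the gains
print(round(disattenuate(0.40, rel, rel), 2))  # Year 1 average correlation
print(round(disattenuate(0.46, rel, rel), 2))  # Year 2 average correlation
```

Under these assumptions the average within-year correlations rise only from 0.40 and 0.46 to about 0.44 and 0.51.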

2) Consistency across years within subtests 

The central analyses of this paper are concerned with the
teachers' consistency across time. This consistency is measured
by the inter-year correlations for the gain scores of the different
subtests (see Table 2). A majority of these correlations are
statistically significant,* and the median correlation, 0.398,
compares well with Brophy's median correlation between successive**
annual gain scores of 0.39.

These correlations vary considerably across subtests, although
the variation is not as large as that found in Brophy's study, where
the range for successive annual gain scores is -0.12 to 0.78. There
are two possible explanations of the subtest variation. First, as a
consequence of different psychometric properties, some of the MAT
subtests may be better at measuring the stable component of teacher
effects than others. Straightforward examination of the MAT did
not reveal any obvious differences between the subtests, but a

*The statistical significance levels are reported even though they
are not strictly meaningful when, as in this case, teachers have
not been randomly selected.

**This includes Year 1-Year 2 and Year 2-Year 3 correlations, but
excludes Year 1-Year 3 correlations.






TABLE 2. Zero order correlations between Year 1
and Year 2 adjusted gain scores. Based
on standardized scores for matched groups,
scores transformed before aggregation.

Subtest                           r         N
Word Knowledge                  .488***     81
Reading                         .198        82
Language                        .398***     80
Language Study Skills           .132        83
Arithmetic Computation          .405***     82
Arithmetic Problem Solving      .457***     80
Social Studies Information      .433***     83
Social Studies Study Skills     .310**      73
Science                         .228*       83




simple analysis of this kind is not definitive. Second, the
impact teachers have may be more stable in some areas of achievement
than in others. This speculation compounds the earlier finding
(Table 1), which showed that gains were only moderately
correlated within years, and suggests that teachers' effects are
related to the particular achievement test that is used to measure
student learning.

A caution is in order. The correlations are circumstantial
evidence of a stable teacher effect. They imply the existence
of teacher behaviors which are stable and which have consistent
effects in successive years, but it must be remembered that these
behaviors have not been identified, nor have they been observed
directly. As mentioned above, effectiveness could be related to
an effective curricular model, so that teachers who use it are
found to be consistently more effective in comparison with other
teachers who do not use that curriculum. The same applies to
other class-level factors. Therefore, it is important to regard
these findings as a tentative indication of a stable teacher
effect rather than a proof that some teachers are superior to
others as a result of their classroom practices. It is important
to bear this proviso in mind in the following analysis, which
treats the data as if the consistency measured by the year-to-year
correlations could be attributed to the teachers' influence.






3) The Size of Teacher Effects

The analyses in this section are concerned with the practical
importance of teacher effects. This will be assessed in two ways:
in terms of achievement test units and in relation to the
pupil-level variation in test scores.

The first way of expressing the size of the stable component
of teacher effects is in terms of achievement test score units.
This requires consideration of both the correlations reported in
Table 2 and the variances of the adjusted gain scores, since the
correlations alone do not indicate the practical impact of
consistent teacher effects. If there is little variation among
teachers in their relative effectiveness, such that the best
teacher is not so different from the worst, then the evidence
of consistency will assume less importance. Conversely, the larger
the variation in teacher effectiveness, and the more consistent
teachers are, the larger their overall impact on student learning.

The method of estimating the size of teacher effects depends
on the assumption that Year 1 and Year 2 adjusted gains are
imperfect measures of the true differences between teachers. These
differences are defined as teachers' ability to consistently change
the average level of achievement in their classes above or below
the predicted level. Seen this way, the square root of the
correlation between Year 1 and Year 2 gains is an estimate of the
correlation between the true, unmeasured teacher consistency
variable and the observed adjusted gain scores. Thus it is
possible to estimate the proportion of the variance in the adjusted
gains that can be attributed to true differences in the consistent
component of teachers' influence; this proportion is the
correlation itself. The analysis is summarized in Table 3. For the
Word Knowledge subtest, the inter-year correlation is 0.488, and the
average standard deviation of the adjusted gain scores is 3.09 test
points. The product of the square root of the correlation and the
standard deviation (3.09 x 0.698 = 2.16) gives the number of test
points associated with one standard deviation difference on the
underlying teacher consistency measure. The estimated effects,
reported in Column 3 of Table 3, are concrete in the sense that they
suggest how much student achievement can be attributed to the stable
element of teachers' impact. For instance, a contrast between the
average teacher and the teacher at the 84th percentile on the
distribution of the unmeasured teacher effect variable is associated
with 2.16 test score points on the Word Knowledge test; a more
extreme contrast, say between the average teachers in the top and
bottom fifths of the effectiveness distribution (2.8 standard
deviations), is associated with a difference of 6.05 test score
points (Column 4). The average effect associated with the top and
bottom fifth contrast, 5.34 achievement test points, implies that
teachers can have important consequences for the amount students
learn. Some teachers are not only consistently better than others,
but their practical effects make an appreciable difference to the
average student in their classes.
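The calculation behind Columns 3 and 4 of Table 3 can be reproduced from Columns 1 and 2. The sketch assumes, as the text implies, that each year's adjusted gain is g_t = T + e_t, a stable teacher component T plus a year-specific component e_t with the same variance in both years, uncorrelated with T and with each other. The inter-year correlation is then Var(T)/Var(g), so sqrt(r) is the correlation between observed gain and T, and SD(T) = sqrt(r) x SD(g). The paper does not say how the 2.8 standard-deviation contrast was obtained; here it is derived by assuming T is normally distributed. The data are copied from Table 3; the helper names are illustrative:

```python
from math import sqrt
from statistics import NormalDist, mean, median

# Columns 1 and 2 of Table 3: inter-year r and the average SD of the
# Year 1 and Year 2 adjusted gains (test points).
TABLE3 = {
    "Word Knowledge": (0.488, 3.09),
    "Reading": (0.198, 3.44),
    "Language": (0.398, 3.81),
    "Language Study Skills": (0.132, 4.09),
    "Arithmetic Computation": (0.405, 3.54),
    "Arithmetic Problem Solving": (0.457, 2.76),
    "Social Studies Information": (0.433, 3.32),
    "Social Studies Study Skills": (0.310, 3.11),
    "Science": (0.228, 3.23),
}

def fifths_contrast_in_sd():
    """SD distance between the means of the top and bottom fifths of a
    normal distribution (assumes the teacher effect is normal)."""
    std_normal = NormalDist()
    z = std_normal.inv_cdf(0.8)               # cut point of the top fifth
    top_fifth_mean = std_normal.pdf(z) / 0.2  # mean of the upper tail
    return 2 * top_fifth_mean                 # bottom fifth is symmetric

def teacher_effect(r, sd_gain, contrast):
    """Column 3: sqrt(r) * SD of gains; Column 4: Column 3 * contrast."""
    one_sd = sqrt(r) * sd_gain
    return one_sd, one_sd * contrast

contrast = fifths_contrast_in_sd()
rows = {name: teacher_effect(r, sd, contrast) for name, (r, sd) in TABLE3.items()}

print(f"contrast = {contrast:.2f} SD")
for name, (col3, col4) in rows.items():
    print(f"{name:28s} {col3:5.2f} {col4:5.2f}")
print("median r (Table 2):", median(r for r, _ in TABLE3.values()))
print("mean Column 4:", round(mean(col4 for _, col4 in rows.values()), 2))
```

The normal assumption gives a contrast of about 2.80 standard deviations, and the recomputed columns agree with Table 3 to within a few hundredths (for example 5.22 rather than 5.24 for Arithmetic Problem Solving), differences that presumably reflect rounding in the original calculation.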

The second way of expressing the importance of teacher 
effects is based on decomposition of pupil level variance in 
spring test scores. There are two components of this variance 




TABLE 3. Estimates of the size of teachers' impact. Based
on standardized scores for matched groups, scores
transformed before aggregation.

Column 1: Correlation between Year 1 and Year 2 adjusted
          gain scores (see Table 2)
Column 2: Average standard deviations for Year 1 and Year 2
          adjusted gain scores
Column 3: First estimate of teacher effect. The test points
          associated with one standard deviation difference
          on the underlying measure of teacher effectiveness.
Column 4: Second estimate of teacher effect. The test points
          associated with the contrast between top and bottom
          fifths of teachers on the underlying measure of
          teacher effectiveness.

Subtest                           1.      2.      3.      4.
Word Knowledge                  .488    3.09    2.16    6.05
Reading                         .198    3.44    1.53    4.28
Language                        .398    3.81    2.40    6.73
Language Study Skills           .132    4.09    1.49    4.16
Arithmetic Computation          .405    3.54    2.25    6.31
Arithmetic Problem Solving      .457    2.76    1.87    5.24
Social Studies Information      .433    3.32    2.18    6.10
Social Studies Study Skills     .310    3.11    1.73    4.85
Science                         .228    3.23    1.54    4.32






which are crucial to the analysis. The first is attributable to the stable teacher effect, the second to the unstable teacher effect. This can be explained by reference to the ANCOVA design. Teachers and Years are defined as two factors in a crossed design, and students are nested within each Teacher-Year cell. The dependent variable is the student's spring score, and the covariate is his fall score on the same subtest. The percentage of student variance that can be assigned to the main teacher effect is called the stable teacher effect; it is the part that is consistent from one year to the next. The second component of variance, the unstable effect, is the part that can be attributed to year-specific effects: the variance assigned to the interaction term (Teachers x Years). This is also a teacher effect, being the part of the teacher's effect that varies from one year to another. There are several reasons to expect teachers to have such an unstable effect. For example, they may adjust their instructional technique to meet the different needs of different groups of students, and in doing so alter the amount they teach. Alternatively, the students in the class may create an informal social ambience that makes instruction more or less difficult in a given year. As the composition of the class changes, so may the teacher's effectiveness. This part of the analysis seeks to identify the unstable teacher effect and compare its size to that of the stable teacher effect.
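The partition just described can be sketched in code. This is a simplified illustration, not the author's computation: it assumes a balanced design (the same number of students in every Teacher-Year cell), removes the pooled within-cell regression of spring on fall scores, and then splits the adjusted scores with an ordinary two-way analysis of variance, reporting each source as a percentage of the adjusted total sum of squares. A full ANCOVA adjusts each effect's sum of squares somewhat differently, and the function name `variance_shares` and its input layout are hypothetical.

```python
from statistics import fmean


def variance_shares(scores):
    """Percentage of the covariate-adjusted sum of squares for each source.

    `scores` maps (teacher, year) to a list of (fall, spring) pairs; every
    cell is assumed to hold the same number of students (balanced design).
    """
    cells = sorted(scores)
    teachers = sorted({t for t, _ in cells})
    years = sorted({y for _, y in cells})
    n = len(scores[cells[0]])

    # Pooled within-cell slope of spring on fall (the common ANCOVA slope).
    sxy = sxx = 0.0
    for pairs in scores.values():
        mx = fmean(x for x, _ in pairs)
        my = fmean(y for _, y in pairs)
        sxy += sum((x - mx) * (y - my) for x, y in pairs)
        sxx += sum((x - mx) ** 2 for x, _ in pairs)
    slope = sxy / sxx

    # Spring scores adjusted to the grand mean of the fall covariate.
    gx = fmean(x for pairs in scores.values() for x, _ in pairs)
    adj = {c: [y - slope * (x - gx) for x, y in scores[c]] for c in cells}

    grand = fmean(v for c in cells for v in adj[c])
    cell = {c: fmean(adj[c]) for c in cells}
    t_mean = {t: fmean(cell[(t, y)] for y in years) for t in teachers}
    y_mean = {y: fmean(cell[(t, y)] for t in teachers) for y in years}

    ss = {
        # Stable component: the teacher main effect.
        "Teacher": n * len(years) * sum((t_mean[t] - grand) ** 2 for t in teachers),
        "Year": n * len(teachers) * sum((y_mean[y] - grand) ** 2 for y in years),
        # Unstable component: the teacher x year interaction.
        "Teacher x Year": n * sum(
            (cell[(t, y)] - t_mean[t] - y_mean[y] + grand) ** 2
            for t in teachers
            for y in years
        ),
        "Within": sum((v - cell[c]) ** 2 for c in cells for v in adj[c]),
    }
    total = sum(ss.values())  # equals the adjusted total SS in a balanced design
    return {source: 100 * value / total for source, value in ss.items()}
```

Because the design is assumed balanced, the four sums of squares add exactly to the total, so the returned percentages sum to 100.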




The results of the ANCOVA are summarized in Table 4, which shows the percentage of variance attributable to the main teacher effect (Teacher) and to the interaction term (Teacher x Years). The consistent teacher effect accounts for an average of 4.76% of the student-level variance in spring scores; unstable teacher effects account for slightly more, 5.85% of the variance. Together the two kinds of teacher effect account for an appreciable proportion, about 10.6% on average, of the overall student-level variance in achievement scores.
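The two averages can be checked against Table 4 by taking the mean of each column over the nine subtests. One entry is damaged in the source scan: the Arithmetic Problem Solving value in the Teacher x Years column is read here as 4.68, an assumption chosen because it is the reading consistent with the 5.85% average quoted in the text.

```python
from statistics import fmean

# Table 4, row by row: (Teacher, Teacher x Years) percentages.
# ASSUMPTION: the Arithmetic Problem Solving interaction entry is read as 4.68.
TABLE_4 = {
    "Word Knowledge": (5.95, 4.75),
    "Reading": (3.01, 8.21),
    "Language": (6.63, 7.17),
    "Language Study Skills": (3.18, 9.57),
    "Arithmetic Computation": (6.43, 3.56),
    "Arithmetic Problem Solving": (4.53, 4.68),
    "Social Studies Information": (6.74, 6.83),
    "Social Studies Study Skills": (3.38, 2.95),
    "Science": (2.95, 4.95),
}

stable = fmean(t for t, _ in TABLE_4.values())    # teacher main effect
unstable = fmean(i for _, i in TABLE_4.values())  # teacher x years interaction
print(f"stable {stable:.2f}%, unstable {unstable:.2f}%, together {stable + unstable:.2f}%")
```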

The results add to those presented earlier by showing the relative importance of stable and unstable teacher effects. By establishing provisional evidence for both stable and unstable teacher effects, the findings suggest that teachers are, to some degree, predictable in the effect they have on students. Of course, whether this effect is large enough to be educationally significant will depend on the immediate context of a policy decision and the goals of the decision-maker. It may be added, however, that since the unstable teacher effect is about as large as the stable component, there is little reason to select or allocate teachers on the basis of a belief that teachers are mainly consistent.
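The two estimates in columns 3 and 4 of the teacher-effect table given earlier in this section can be reproduced, to within rounding, from columns 1 and 2. Columns 1 and 2 are defined before this excerpt, so the reading used below is an assumption inferred from the arithmetic, not a statement of the author's method: column 1 is treated as a between-year reliability coefficient and column 2 as the standard deviation of adjusted gains in test points, so that column 3 is approximately the square root of column 1 times column 2. Column 4 is then column 3 multiplied by about 2.80, the distance in standard deviations between the means of the top and bottom fifths of a normal distribution.

```python
from math import sqrt
from statistics import NormalDist

# Columns 1-4 of the teacher-effect table, copied row by row.
TABLE_3 = {
    "Word Knowledge": (0.488, 3.09, 2.16, 6.05),
    "Reading": (0.198, 3.44, 1.53, 4.28),
    "Language": (0.398, 3.81, 2.40, 6.73),
    "Language Study Skills": (0.132, 4.09, 1.49, 4.16),
    "Arithmetic Computation": (0.405, 3.54, 2.25, 6.31),
    "Arithmetic Problem Solving": (0.457, 2.76, 1.87, 5.24),
    "Social Studies Information": (0.433, 3.32, 2.18, 6.10),
    "Social Studies Study Skills": (0.310, 3.11, 1.73, 4.85),
    "Science": (0.228, 3.23, 1.54, 4.32),
}


def top_bottom_fifth_gap() -> float:
    """Distance, in SD units, between the means of the top and bottom
    fifths of a standard normal distribution."""
    nd = NormalDist()
    z = nd.inv_cdf(0.8)          # cut-off for the top fifth
    top_mean = nd.pdf(z) / 0.2   # E[Z | Z > z]
    return 2 * top_mean          # the bottom-fifth mean is -top_mean


GAP = top_bottom_fifth_gap()     # roughly 2.80

for name, (c1, c2, c3, c4) in TABLE_3.items():
    # ASSUMED reading: sqrt(reliability) x SD of adjusted gains gives the
    # SD of the stable ("underlying") part of the teacher effect.
    est3 = sqrt(c1) * c2
    est4 = GAP * c3
    print(f"{name:28} col3 {est3:.2f} (table {c3:.2f})  col4 {est4:.2f} (table {c4:.2f})")
```

Every recomputed value lands within about 0.01 of the tabled one, which is what rounding to two decimals would allow.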






TABLE 4. Percentage of student-level variance in achievement scores that can be attributed to two sources: the teacher main effect (stable component) and the teacher x years interaction effect (unstable component). Based on standardized scores for matched groups, scores transformed before aggregation.

Subtest                        Teacher   Teacher x Years
Word Knowledge                   5.95         4.75
Reading                          3.01         8.21
Language                         6.63         7.17
Language Study Skills            3.18         9.57
Arithmetic Computation           6.43         3.56
Arithmetic Problem Solving       4.53         4.68
Social Studies Information       6.74         6.83
Social Studies Study Skills      3.38         2.95
Science                          2.95         4.95










4) Specially Effective Teachers

Inspection of the frequency distribution of the adjusted gain scores showed that they were positively skewed, with a small number of teachers scoring well over two standard deviations above the mean. The question here is whether this small group of specially effective teachers was consistently effective between years within the same subtests. If specially effective teachers also perform consistently, it is conceivable that the stable teacher effect reported in the previous section can be partly accounted for in terms of a small number of teachers.

The most direct way of looking at the part that exceptional teachers play is to inspect the bivariate plot of adjusted gain scores on one subtest for Year 1 and Year 2 (Table 5). This plot shows that there are certainly three, and possibly five, teachers who stand out from the rest in the upper right-hand portion of the plot. Plots for other subtests revealed similar outlying points. The outlying teachers tend to be consistent as well as specially effective. Other teacher consistency studies have not explored the question of outlying data points, so the finding cannot be corroborated. This is unfortunate, since the finding suggests an important qualification of the results reported above. The specially effective teachers make a disproportionate contribution both to the variance of the adjusted gain scores and to the size of the between-year correlation. Therefore, the teacher effect reported here can be attributed to some degree to the existence of small numbers of special teachers.
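How a few consistent outliers can carry a between-year correlation is easy to see with made-up numbers. The sketch below uses hypothetical standardized adjusted gains, not the study's data: six ordinary teachers whose standing does not persist from Year 1 to Year 2, and three hypothetical outliers who are far above the mean in both years. The helper names `pearson` and `between_year_r` are illustrative.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def between_year_r(pairs):
    """Correlation of Year 1 with Year 2 gains for (year1, year2) pairs."""
    year1, year2 = zip(*pairs)
    return pearson(year1, year2)


# Hypothetical ordinary teachers: no carry-over from one year to the next.
ordinary = [(-1.0, -1.0), (-1.0, 1.0), (1.0, -1.0), (1.0, 1.0), (0.0, 0.0), (0.0, 0.0)]
# Hypothetical outliers: well above the mean in both years.
outliers = [(3.0, 3.0), (3.5, 2.8), (2.8, 3.2)]

r_ordinary = between_year_r(ordinary)
r_all = between_year_r(ordinary + outliers)
print(f"ordinary teachers only: r = {r_ordinary:.2f}")
print(f"with three outliers:    r = {r_all:.2f}")
```

In this contrived case the ordinary teachers alone give a correlation of zero, while adding the three outliers raises it to about 0.8, even though the outliers are only a third of the points.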






TABLE 5. Bivariate plot of adjusted gain scores on one subtest, Year 1 against Year 2.



The implication for future research is twofold. First, it is important to know whether this finding is duplicated in similar studies. Second, if it is, a case could be made for special studies of these teachers, on the argument that effective teacher behaviors would be especially evident in this group of teachers and therefore easier to observe.

5) Teacher Effects on the Spread of Achievement Scores 

The first four sections of this analysis have been concerned with the effect teachers have on the average level of performance in the class. The average score, and changes in the average, may be unrelated to the dispersion of achievement. So the average scores of two classes may change in the same way while the dispersion of scores changes in very different ways. For instance, the dispersion might shrink in one class relative to the other if the teacher is effective in bringing students within a narrower range of scores than they began with. This might happen as a consequence of differential attention being paid either to the slow or to the clever students. Alternatively, the dispersion in one class might increase if the teacher's effects are proportional to a student's initial achievement level. The question raised in this part of the analysis is whether, and to what extent, teachers alter the dispersion of achievement scores.
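The point that two classes can show the same change in average but very different changes in dispersion can be made concrete with two hypothetical classes that start from identical fall scores. Teacher A adds the same number of points to every student; Teacher B's gains are proportional to each student's initial score (the second mechanism described above). The scores, the 20% rate and the helper `summarize` are illustrative assumptions.

```python
from statistics import fmean, pvariance


def summarize(fall, spring):
    """Return (mean gain, spring variance / fall variance) for one class."""
    return fmean(spring) - fmean(fall), pvariance(spring) / pvariance(fall)


# Hypothetical fall scores, identical for both classes.
fall = [20.0, 30.0, 40.0, 50.0, 60.0]

# Teacher A: every student gains the same 8 points.
spring_a = [x + 8.0 for x in fall]
# Teacher B: each student gains 20% of his fall score (mean gain is also 8).
spring_b = [x * 1.2 for x in fall]

for label, spring in (("A", spring_a), ("B", spring_b)):
    gain, ratio = summarize(fall, spring)
    print(f"Teacher {label}: mean gain {gain:.1f}, variance ratio {ratio:.2f}")
```

Both classes gain 8 points on average, but the within-class variance is unchanged for Teacher A and multiplied by 1.2 squared, that is 1.44, for Teacher B.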

Within-class variances are computed for each class on each subtest for both Year 1 and Year 2. The central tendencies of these variances are summarized by their means in Table 6. There are three observations to be made about the results. The average within-class variance always increases from fall to spring; the increase for a given subtest in Year 1 is very similar to the increase in Year 2; and, most strikingly, there are substantial disparities in the results across subtests.

TABLE 6. Average within-class variances, by year and by subtest, for standardized scores based on matched groups, scores transformed before aggregation.

                    Year 1                         Year 2
Subtest    Fall    Spring   Difference    Fall    Spring   Difference
WK         44.30   49.60       5.30       45.34   49.32       3.98
RD         50.59   57.99       7.40       49.95   60.36      10.41
LG         58.61   64.83       6.22       55.33   65.03       9.70
LS         59.44   72.51      13.07       57.03   72.81      15.78
AC         27.87   53.86      25.99       26.42   54.97      28.55
AP         35.21   43.06       7.79       33.03   44.56      11.53
SS         40.72   43.72       3.00       38.92   39.87       0.95
SK         54.24   59.93       5.69       51.96   57.09       5.13
SC         48.88   59.66      10.78       48.20   58.04       9.84

WK = Word Knowledge, RD = Reading, LG = Language, LS = Language Study Skills, AC = Arithmetic Computation, AP = Arithmetic Problem Solving, SS = Social Studies Information, SK = Social Studies Study Skills, SC = Science.
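The first two observations can be checked directly from the Difference columns of Table 6. The sketch below confirms that every difference is positive and summarizes the Year 1 to Year 2 similarity as a correlation across the nine subtests; that correlation is a convenience summary chosen here, not a statistic the author reports. The range of the differences illustrates the third observation, the disparity across subtests.

```python
from math import sqrt

# Spring-minus-fall differences in average within-class variance,
# as printed in Table 6: (Year 1, Year 2) for each subtest.
TABLE_6_DIFF = {
    "WK": (5.30, 3.98),
    "RD": (7.40, 10.41),
    "LG": (6.22, 9.70),
    "LS": (13.07, 15.78),
    "AC": (25.99, 28.55),
    "AP": (7.79, 11.53),
    "SS": (3.00, 0.95),
    "SK": (5.69, 5.13),
    "SC": (10.78, 9.84),
}


def pearson(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


year1 = [d1 for d1, _ in TABLE_6_DIFF.values()]
year2 = [d2 for _, d2 in TABLE_6_DIFF.values()]
all_diffs = year1 + year2

print("every difference positive:", all(d > 0 for d in all_diffs))
print(f"Year 1 vs Year 2 across subtests: r = {pearson(year1, year2):.2f}")
print(f"smallest {min(all_diffs):.2f}, largest {max(all_diffs):.2f}")
```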

The increase in spread indicates one of three possibilities: students with high scores move further from the mean, students with low scores move further from the mean, or students near the mean move away from the mean. Since there is no evidence of bimodality in the spring distributions, the third alternative seems unlikely. But the question remains of what part teachers play in this shift. The results only hint at the likely direction of teachers' influence; they do not demonstrate to what degree teachers are responsible for changes in variance. In addition, the wide variation in results for different subtests raises the possibility that the psychometric properties of these subtests might account for some of the increase in variance. This deserves consideration.

If these tests are generally too difficult for students in the fall, but become more appropriate for their range of achievement in the spring, an increase in variance such as that reported in Table 6 would be anticipated. If this happens, the tests that have the most marked floor effect in the fall should also show the largest increase in variance. To test this possibility, the floor of each subtest was defined, the difference between the average class mean fall score and the floor was calculated for each subtest, and this difference score was related to the change in variance over the school year for the subtest in question. If floor effects explain the increases in variance, there will be a negative relationship between the two difference scores: (average of class means - floor for that subtest) and (spring variance - fall variance).

For the purpose of this analysis the floor of the subtest is defined as the chance score, that is, the average score that would be obtained if students checked answers at random. This score could not be calculated for the two arithmetic subtests, which have open-ended items. The difference between the chance score and the average of the fall mean scores is defined as the extent to which the subtest has a floor effect. This difference score forms the X-axis of Table 7; the Y-axis is the difference between spring and fall variances. Each subtest contributes two points to the plot, one for each year. The two variables are positively correlated (r = 0.26). Thus the hypothesis that floor effects in the subtests might explain the increases in variance is rejected, and this leaves open the possibility that some of the increase might be accounted for by the teachers.
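The logic of the floor-effect check can be sketched as follows. The item counts, option counts, fall means and variance changes below are hypothetical (this section does not give the study's values), and the chance score is taken to be the number of items divided by the number of equally likely options per item, an assumption about how "checking answers at random" is scored. The floor hypothesis predicts a negative correlation; the study itself found r = +0.26 and so rejected it.

```python
from math import sqrt


def chance_score(n_items: int, n_options: int) -> float:
    """Expected number correct if every item is answered at random."""
    return n_items / n_options


def pearson(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical subtest-years (NOT the study's data):
# (number of items, options per item, average fall class mean,
#  spring variance minus fall variance)
points = [
    (50, 4, 22.0, 6.0),
    (50, 4, 30.0, 4.0),
    (40, 5, 15.0, 9.0),
    (40, 5, 25.0, 5.0),
]

floor_gap = [mean - chance_score(k, m) for k, m, mean, _ in points]
variance_change = [dv for _, _, _, dv in points]
r = pearson(floor_gap, variance_change)

# Floor effects predict r < 0: subtests whose fall means sit nearest the
# chance score should show the largest rise in variance.
verdict = "is consistent with" if r < 0 else "contradicts"
print(f"r = {r:.2f}, which {verdict} the floor-effect explanation")
```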

Like earlier parts of the analysis, the focus here is on teachers' consistent effects, but the present analysis differs in looking at teachers' impact on the spread of achievement scores rather than on changes in the class average scores. The purpose is to establish the existence of a stable teacher effect on within-class variance in the spring while controlling for initial differences among classes in their fall variances. To this end a two-way ANCOVA is used in which teachers and years are the two factors. The dependent variable is the spring within-class variance of achievement scores, and the covariate is the fall within-class variance. Each cell in the design has a single observation on the dependent variable and the covariate.

TABLE 7. Difference between spring and fall within-class variance (Y-axis) plotted against the difference between the average of the fall class means and the chance score (X-axis); each subtest contributes two points, one for each year.

The test of the existence of a consistent teacher effect on the variance of student achievement scores is the statistical significance of the teacher main effect. The results for each of the nine subtests are presented in Table 8. The variability of the results makes it impossible to arrive at a clear conclusion. For some subtests (notably LS, AC, AP, SS), the teacher effect is statistically significant, suggesting that teachers make a consistent difference to the dispersion of achievement scores. But these results must be balanced against the non-significant findings for the Word Knowledge, Reading and Language subtests. Mixed results like these may simply reflect sampling fluctuations. Alternatively, they may reflect real effects, in this case a selective teacher effect on the dispersion of achievement scores that depends on the type of test used to measure student performance. However, the task of devising a hypothesis to account for a selective effect of this kind is formidable.
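Each F-ratio in Table 8 is the mean square for a source (its sum of squares divided by its degrees of freedom) divided by the Teacher x Year mean square, which the table marks as the term used to test the effects. A minimal check against the Word Knowledge and Arithmetic Computation rows; the helper name `f_ratio` is illustrative.

```python
def f_ratio(ss: float, df: int, ms_error: float) -> float:
    """Mean square for a source (SS / d.f.) divided by the error mean square."""
    return (ss / df) / ms_error


# Entries copied from Table 8; the Teacher x Year mean square is the error term.
wk_txy = 317.61
ac_txy = 388.01

print(f"WK teacher   F = {f_ratio(18910.38, 80, wk_txy):.3f}")  # table: 0.744
print(f"WK covariate F = {f_ratio(7273.41, 1, wk_txy):.2f}")    # table: 22.90
print(f"AC teacher   F = {f_ratio(97831.69, 81, ac_txy):.2f}")  # table: 3.11
```

Converting these ratios into significance levels also requires the degrees of freedom of the Teacher x Year term, which Table 8 does not print.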






TABLE 8.

Subtest  Source               Sum of squares   d.f.   Mean square   F-ratio
WK       Teacher                   18910.38     80        236.38    0.744
         Covariates                 7273.41      1       7273.41    22.90***
         (*) Teacher x Year                                317.61
RD       Teacher                   41806.43     81        516.13    0.969
         Covariates                 7677.12      1       7677.12    14.41***
         (*) Teacher x Year                                532.89
LG       Teacher                   40852.10     79        517.12    1.211
         Covariates                 7837.22      1       7837.22    18.22**
         (*) Teacher x Year                                427.17
LS       Teacher                   78884.69     82        962.01    1.795**
         Covariates                11446.11      1      11446.11    21.356**
         (*) Teacher x Year                                535.98
AC       Teacher                   97831.69     81       1207.80    3.11***
         Covariates                10771.84      1      10771.84    27.76***
         (*) Teacher x Year                                388.01
AP       Teacher                   40580.31     79        513.68    1.94**
         Year                       2584.91      1       2584.91    9.77**
         (*) Teacher x Year                                264.58
SS       Teacher                   36861.97     82        449.54    2.12**
         Covariates                   53.15      1         53.15    0.25
         (*) Teacher x Year                                211.90
SK       Teacher                   46379.47     72        644.16    1.581*
         Covariates                  624.59      1        624.59    1.533
         (*) Teacher x Year                                407.32
SC       Teacher                   57656.82     82        703.13    1.677*
         Covariates                 1465.49      1       1465.49    3.496
         (*) Teacher x Year                                419.17

(*) The term used to test the effects.





Discussion 

The main finding of this study, which falls in line with those of four similar studies, is that teachers have a consistent effect on the average scores of the classes they teach in different years. They are also consistent in their effects measured on different subtests within the same school year.

Finally, teachers are found to have a year-specific effect which the best estimates available show to be about as large as the stable teacher effect. Teachers do not appear to have a consistent effect on the spread of scores within their classes, even though the spread tends to increase during the school year.

While the general finding of teacher consistency parallels earlier findings, there is one important departure: the discovery of specially effective teachers. This is a startling finding which cannot be confirmed or disconfirmed, since data in previous studies have not been analysed in the appropriate manner. It is important to know whether the finding is a quirk. If future research showed outliers to be a general phenomenon, detailed studies of these specially effective teachers would be justified. On the other hand, if replications support the finding of consistent teacher effects but fail to identify a specially effective subgroup, a somewhat different direction can be envisaged for future work. The ultimate goal of the research should be the identification of the correlates of effective teacher behaviour. This means defining and isolating the attributes of teachers and the nature of the teaching process which account for variations in the adjusted gains of classes. It also means identifying the correlates of the stable component of those gains, in the way that stability has been defined here.


References



Brophy, J. 1973. Stability of teacher effectiveness. American Educational Research Journal, 10(3), 245-252.

Harris, A. J., Morrison, C., Serwer, & Gold, L. 1968. A continuation of the CRAFT project: Comparing reading approaches with disadvantaged urban Negro children in primary grades. New York: Division of Teacher Education of the City University of New York (USOE Project # 5-0570-2-12-1). (ERIC ED 020 297)

Morsch, J. E., Burgess, G. G., & Smith, E. N. 1955. Student achievement as a measure of instructor effectiveness. Project # 7950, Task # 77243. San Antonio, Texas: Air Force Personnel and Training Research Center.

Rosenshine, B. 1970. The stability of teacher effects upon student achievement. Review of Educational Research, 40(5), 647-662.

Soar, R. S. 1966. An integrative approach to classroom learning. Philadelphia, Penna.: Temple University. (ERIC ED 033 749)






